Abstract

We consider estimation procedures which are recursive in the sense 
that each successive estimator is obtained from the previous one by a 
simple adjustment. We propose a wide class of recursive estimation 
procedures for the general statistical model and study convergence. 
1 Introduction

Let $X_1, \dots, X_n$ be independent identically distributed (i.i.d.)
random variables (r.v.'s) with a common distribution function $F_\theta$
with a real unknown parameter $\theta$. An M-estimator of $\theta$ is
defined as a statistic $\hat\theta_n = \hat\theta_n(X_1, \dots, X_n)$,
which is a solution w.r.t. $v$ of the estimating equation

(1.1)    $$\sum_{i=1}^{n} \psi(X_i, v) = 0,$$

where $\psi$ is a suitably chosen function. For example, if $\theta$ is a
location parameter in the normal family of distribution functions, the
choice $\psi(x, v) = x - v$ gives the MLE (maximum likelihood estimator).
For the same problem, if $\psi(x, v) = \mathrm{sign}(x - v)$, the solution
of (1.1) reduces to the median of $X_1, \dots, X_n$. In general, if
$f(x, \theta)$ is the probability density function (or probability
function) of $F_\theta(x)$ (w.r.t. a $\sigma$-finite measure $\mu$) then
the choice $\psi(x, v) = f'(x, v)/f(x, v)$ yields the MLE.
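
As an illustration of the two location examples above, the following Python sketch solves the estimating equation (1.1) numerically. The bisection solver, the function names and the toy sample are assumptions made for this illustration only; they are not part of the paper.

```python
def solve_estimating_equation(psi, xs, lo, hi, tol=1e-10):
    """Bisection for a root in v of sum_i psi(x_i, v) on [lo, hi].

    Assumes the sum is non-increasing in v, which holds for the two
    psi functions defined below.
    """
    def total(v):
        return sum(psi(x, v) for x in xs)

    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if total(mid) > 0:
            lo = mid   # the sum is still positive: the root is to the right
        else:
            hi = mid   # the sum is zero or negative: the root is at or left of mid
    return (lo + hi) / 2.0


def psi_mean(x, v):
    # psi(x, v) = x - v: the root is the sample mean (normal location MLE)
    return x - v


def psi_median(x, v):
    # psi(x, v) = sign(x - v): the root is the sample median
    return (x > v) - (x < v)


xs = [2.0, -1.0, 4.0, 0.5, 10.0]   # toy sample (hypothetical)
m_mean = solve_estimating_equation(psi_mean, xs, min(xs), max(xs))
m_median = solve_estimating_equation(psi_median, xs, min(xs), max(xs))
```

For this sample the first root agrees with the arithmetic mean and the second with the middle order statistic. The outlier 10.0 pulls the mean but not the median.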
Suppose now that $X_1, \dots, X_n$ are not necessarily independent or
identically distributed r.v.'s, with a joint distribution depending on a
real parameter $\theta$. Then an M-estimator of $\theta$ is defined as a
solution of the estimating equation

(1.2)    $$\sum_{i=1}^{n} \psi_i(v) = 0,$$

where $\psi_i(v) = \psi_i(X_{i-k}^{i}; v)$ with
$X_{i-k}^{i} = (X_{i-k}, \dots, X_i)$. So, the $\psi$-functions may now
depend on the past observations as well. For instance, if the $X_i$'s are
observations from a discrete time Markov process, then one can assume that
$k = 1$. In general, if no restrictions are placed on the dependence
structure of the process $X_i$, one may need to consider $\psi$-functions
depending on the vector of all past and present observations of the
process (that is, $k = i - 1$). If the conditional probability density
function (or probability function) of the observation $X_i$, given
$X_{i-k}, \dots, X_{i-1}$, is
$f_i(x, \theta) = f_i(x, \theta \mid X_{i-k}, \dots, X_{i-1})$, then one
can obtain the MLE on choosing $\psi_i(v) = f_i'(X_i, v)/f_i(X_i, v)$.
Besides MLEs, the class of M-estimators includes estimators with special
properties such as robustness. Under certain regularity and ergodicity
conditions it can be proved that there exists a consistent sequence of
solutions of (1.2) which has the property of local asymptotic linearity.
(See e.g., Serfling [24], Huber [9], Lehman [16]. A comprehensive
bibliography can be found in Launer and Wilkinson [12], Hampel et al [7],
Rieder [21], and Jurečková and Sen [10].)

If the $\psi$-functions are nonlinear, it is rather difficult to work with
the corresponding estimating equations, especially if for every sample
size $n$ (when new data are acquired), an estimator has to be computed
afresh. In this paper we consider estimation procedures which are
recursive in the sense that each successive estimator is obtained from the
previous one by a simple adjustment. Note that for a linear estimator,
e.g., for the sample mean, $\hat\theta_n = \bar X_n$, we have
$\bar X_n = (n-1)\bar X_{n-1}/n + X_n/n$, that is,
$\hat\theta_n = \hat\theta_{n-1}(n-1)/n + X_n/n$, indicating that the
estimator $\hat\theta_n$ at each step $n$ can be obtained recursively
using the estimator at the previous step $\hat\theta_{n-1}$ and the new
information $X_n$. Such an exact recursive relation may not hold for
nonlinear estimators (see, e.g., the case of the median).
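
The exact recursion for the sample mean can be written in a few lines. The sketch below is a minimal Python illustration; the function name and the toy data are hypothetical.

```python
def recursive_mean(xs):
    """Return the running means computed by theta_n = theta_{n-1}(n-1)/n + x_n/n."""
    theta = 0.0          # the starting value is irrelevant: it is multiplied by 0 at n = 1
    path = []
    for n, x in enumerate(xs, start=1):
        theta = theta * (n - 1) / n + x / n   # uses only the previous estimate and x_n
        path.append(theta)
    return path


xs = [2.0, -1.0, 4.0, 0.5, 10.0]   # toy sample (hypothetical)
path = recursive_mean(xs)
```

Each update needs only the previous estimate, the step number and the new observation, so past data never have to be stored.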

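In general, the following heuristic argument can be used to establish a
possible form of an approximate recursive relation (see also Jurečková and
Sen [10], Khas'minskii and Nevelson [11], Lazrieva and Toronjadze [15]).
Since $\hat\theta_n$ is defined as a root of the estimating equation
(1.2), denoting the left hand side of (1.2) by $M_n(v)$ we have
$M_n(\hat\theta_n) = 0$ and $M_{n-1}(\hat\theta_{n-1}) = 0$. Assuming that
the difference $\hat\theta_n - \hat\theta_{n-1}$ is "small" we can write

    $$0 = M_n(\hat\theta_n) - M_{n-1}(\hat\theta_{n-1})
        = M_n\big(\hat\theta_{n-1} + (\hat\theta_n - \hat\theta_{n-1})\big) - M_{n-1}(\hat\theta_{n-1})$$
    $$\approx M_n(\hat\theta_{n-1}) + M_n'(\hat\theta_{n-1})(\hat\theta_n - \hat\theta_{n-1}) - M_{n-1}(\hat\theta_{n-1})
        = M_n'(\hat\theta_{n-1})(\hat\theta_n - \hat\theta_{n-1}) + \psi_n(\hat\theta_{n-1}).$$

Therefore,

    $$\hat\theta_n \approx \hat\theta_{n-1} - \frac{\psi_n(\hat\theta_{n-1})}{M_n'(\hat\theta_{n-1})},$$

where $M_n'(\theta) = \sum_{i=1}^{n} \psi_i'(\theta)$. Now, depending on
the nature of the underlying model, $M_n'(\theta)$ can be replaced by a
simpler expression. For instance, in i.i.d. models with
$\psi(x, v) = f'(x, v)/f(x, v)$ (the MLE case), by the strong law of large
numbers,

    $$\frac{1}{n} M_n'(\theta) = \frac{1}{n} \sum_{i=1}^{n} \big(f'(X_i, \theta)/f(X_i, \theta)\big)'
        \approx E_\theta\big[\big(f'(X_1, \theta)/f(X_1, \theta)\big)'\big] = -i(\theta)$$

for large $n$'s, where $i(\theta)$ is the one-step Fisher information. So,
in this case, one can use the recursion

(1.3)    $$\hat\theta_n = \hat\theta_{n-1} + \frac{1}{n\, i(\hat\theta_{n-1})}\, \frac{f'(X_n, \hat\theta_{n-1})}{f(X_n, \hat\theta_{n-1})}, \quad n \ge 1,$$

to construct an estimator which is "asymptotically equivalent" to the MLE.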
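
To make recursion (1.3) concrete, the sketch below applies it to the location parameter of a Cauchy distribution, a case where the MLE has no closed form. The choice of model is an assumption made for illustration, as are the function names, the seed and the sample size. For this model $f'/f = 2(x-\theta)/(1+(x-\theta)^2)$ and the Fisher information is $i(\theta) = 1/2$.

```python
import math
import random


def cauchy_score(x, theta):
    """f'(x, theta) / f(x, theta) for the Cauchy location family (derivative in theta)."""
    r = x - theta
    return 2.0 * r / (1.0 + r * r)


def recursive_mle_cauchy(xs, theta0=0.0):
    """Recursion (1.3): theta_n = theta_{n-1} + score(x_n, theta_{n-1}) / (n * i)."""
    fisher = 0.5                     # one-step Fisher information of the Cauchy location family
    theta = theta0
    for n, x in enumerate(xs, start=1):
        theta += cauchy_score(x, theta) / (n * fisher)
    return theta


rng = random.Random(0)               # fixed seed, illustration only
true_theta = 1.0
# Cauchy(true_theta, 1) variates via the inverse distribution function
xs = [true_theta + math.tan(math.pi * (rng.random() - 0.5)) for _ in range(20000)]
estimate = recursive_mle_cauchy(xs)
```

Because the score is bounded, a single extreme observation moves the estimate by at most $2/n$ at step $n$, in contrast with the sample mean, which is not consistent for this family.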
Motivated by the above argument, we consider a class of estimators

(1.4)    $$\hat\theta_n = \hat\theta_{n-1} + \Gamma_n^{-1}(\hat\theta_{n-1})\, \psi_n(\hat\theta_{n-1}), \quad n \ge 1,$$

where $\psi_n$ is a suitably chosen vector process, $\Gamma_n$ is a
(possibly random) normalizing matrix process and
$\hat\theta_0 \in \mathbb{R}^m$ is some initial value. Note that while the
main goal is to study recursive procedures with non-linear $\psi_n$
functions, it is worth mentioning that any linear estimator can be written
in the form (1.4) with linear, w.r.t. $\theta$, $\psi_n$ functions.
Indeed, if $\hat\theta_n = \Gamma_n^{-1} \sum_{k=1}^{n} h_k(X_k)$, where
$\Gamma_k$ and $h_k(X_k)$ are matrix and vector processes of suitable
dimensions, then (see Section 4.2 for details)

    $$\hat\theta_n = \hat\theta_{n-1} + \Gamma_n^{-1}\big(h_n(X_n) - (\Gamma_n - \Gamma_{n-1})\hat\theta_{n-1}\big),$$

which is obviously of the form (1.4) with
$\psi_n(\theta) = h_n(X_n) - (\Gamma_n - \Gamma_{n-1})\theta$.

It should be noted that at first glance, recursions (1.3) and (1.4)
resemble the Newton-Raphson iterative procedure of numerical optimisation.
In the i.i.d. case, the Newton-Raphson iteration for the likelihood
equation is

(1.5)    $$\vartheta_k = \vartheta_{k-1} + J^{-1}(\vartheta_{k-1}) \sum_{i=1}^{n} \frac{f'(X_i, \vartheta_{k-1})}{f(X_i, \vartheta_{k-1})}, \quad k \ge 1,$$

where $J(v)$ is minus the second derivative of the log-likelihood
function, that is,
$-\sum_{i=1}^{n} \frac{\partial}{\partial v}\big(f'(X_i, v)/f(X_i, v)\big)$,
or its expectation, that is, the information matrix $n\, i(v)$. In the
latter case, the iterative scheme is often called the method of scoring,
see e.g., Harvey [8]. (We do not consider the so called one-step
Newton-Raphson method since it requires an auxiliary consistent
estimator.) The main feature of the scheme (1.5) is that $\vartheta_k$, at
each step $k = 1, 2, \dots$, is $\sigma(X_1, \dots, X_n)$-measurable
(where $\sigma(X_1, \dots, X_n)$ is the $\sigma$-field generated by the
random variables $X_1, \dots, X_n$). In other words, (1.5) is a
deterministic procedure to find a root, say $\tilde\theta_n$, of the
likelihood equation $\sum_{i=1}^{n} \big(f'(X_i, v)/f(X_i, v)\big) = 0$.
On the other hand the random variable $\hat\theta_n$ derived from (1.3) is
an estimator of $\theta$ for each $n = 1, 2, \dots$ (is
$\sigma(X_1, \dots, X_n)$-measurable at each $n$). Note also that in the
i.i.d. case, (1.3) can be regarded as a stochastic iterative scheme, i.e.,
a classical stochastic approximation procedure, to detect the root of an
unknown function when the latter can only be observed with random errors
(see Remark 3.1). A theoretical implication of this is that by studying
the procedures (1.3), or in general (1.4), we study the asymptotic
behaviour of the estimator of the unknown parameter. As far as
applications are concerned, there are several advantages in using (1.4).
Firstly, these procedures are easy to use since each successive estimator
is obtained from the previous one by a simple adjustment and without
storing all the data unnecessarily. This is especially convenient when the
data come sequentially. Another potential benefit of using (1.4) is that
it allows one to monitor and detect certain changes in probabilistic
characteristics of the underlying process such as a change of the value of
the unknown parameter. So, there may be a benefit in using these
procedures in linear cases as well.

In i.i.d. models, estimating procedures similar to (1.4) have been studied
by a number of authors using methods of stochastic approximation theory
(see, e.g., Khas'minskii and Nevelson [11], Fabian [4], Ljung and
Soderstrom [19], Ljung et al [18], and references therein). Some work has
been done for non i.i.d. models as well. In particular, Englund et al [3]
give asymptotic representation results for a certain type of $X_n$
processes. In Sharia [25] theoretical results on convergence, rate of
convergence and the asymptotic representation are given under certain
regularity and ergodicity assumptions on the model, in the one-dimensional
case with
$\psi_n(x, \theta) = \frac{\partial}{\partial\theta} \log f_n(x, \theta)$
(see also Campbell [2], Sharia [26] and Lazrieva et al [13]).

In the present paper, we study multidimensional estimation procedures of
type (1.4) for the general statistical model. Section 2 introduces the
basic model, objects and notation. In Section 3, imposing "global"
restrictions on the processes $\psi$ and $\Gamma$, we study "global"
convergence of the recursive estimators, that is, the convergence for an
arbitrary starting point $\hat\theta_0$. In Section 4, we demonstrate the
use of these results on some examples. (Results on rate of convergence,
asymptotic linearity and efficiency, and numerical simulations will appear
in subsequent publications, see Sharia [27], [28].)

2 Basic model, notation and preliminaries 

Let $X_t$, $t = 1, 2, \dots$, be observations taking values in a
measurable space $(\mathbf{X}, \mathcal{B}(\mathbf{X}))$ equipped with a
$\sigma$-finite measure $\mu$. Suppose that the distribution of the
process $X_t$ depends on an unknown parameter $\theta \in \Theta$, where
$\Theta$ is an open subset of the m-dimensional Euclidean space
$\mathbb{R}^m$. Suppose also that for each $t = 1, 2, \dots$, there exists
a regular conditional probability density of $X_t$ given values of past
observations of $X_{t-1}, \dots, X_2, X_1$, which will be denoted by

    $$f_t(\theta, x_t \mid x_1^{t-1}) = f_t(\theta, x_t \mid x_{t-1}, \dots, x_1),$$

where $f_1(\theta, x_1 \mid x_1^{0}) = f_1(\theta, x_1)$ is the
probability density of the random variable $X_1$. Without loss of
generality we assume that all random variables are defined on a
probability space $(\Omega, \mathcal{F})$ and denote by
$\{P^\theta, \theta \in \Theta\}$ the family of the corresponding
distributions on $(\Omega, \mathcal{F})$.

Let $\mathcal{F}_t = \sigma(X_1, \dots, X_t)$ be the $\sigma$-field
generated by the random variables $X_1, \dots, X_t$. By
$(\mathbb{R}^m, \mathcal{B}(\mathbb{R}^m))$ we denote the m-dimensional
Euclidean space with the Borel $\sigma$-algebra
$\mathcal{B}(\mathbb{R}^m)$. Transposition of matrices and vectors is
denoted by $T$. By $(u, v)$ we denote the standard scalar product of
$u, v \in \mathbb{R}^m$, that is, $(u, v) = u^T v$.

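Suppose that $h$ is a real valued function defined on
$\Theta \subset \mathbb{R}^m$. We denote by $\dot h(\theta)$ the
row-vector of partial derivatives of $h(\theta)$ with respect to the
components of $\theta$, that is,

    $$\dot h(\theta) = \left(\frac{\partial}{\partial\theta^1} h(\theta), \dots, \frac{\partial}{\partial\theta^m} h(\theta)\right).$$

Also we denote by $\ddot h(\theta)$ the matrix of second partial
derivatives. The $m \times m$ identity matrix is denoted by $\mathbf{1}$.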

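If for each $t = 1, 2, \dots$, the derivative
$\dot f_t(\theta, x_t \mid x_1^{t-1})$ w.r.t. $\theta$ exists, then we can
define the function

    $$l_t(\theta, x_t \mid x_1^{t-1}) = \frac{1}{f_t(\theta, x_t \mid x_1^{t-1})}\, \dot f_t^T(\theta, x_t \mid x_1^{t-1})$$

with the convention $0/0 = 0$.

The one step conditional Fisher information matrix for $t = 1, 2, \dots$
is defined as

    $$i_t(\theta \mid x_1^{t-1}) = \int l_t(\theta, z \mid x_1^{t-1})\, l_t^T(\theta, z \mid x_1^{t-1})\, f_t(\theta, z \mid x_1^{t-1})\, \mu(dz).$$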
We shall use the notation

    $$f_t(\theta) = f_t(\theta, X_t \mid X_1^{t-1}), \qquad
      l_t(\theta) = l_t(\theta, X_t \mid X_1^{t-1}), \qquad
      i_t(\theta) = i_t(\theta \mid X_1^{t-1}).$$

Note that the process $i_t(\theta)$ is "predictable", that is, the random
variable $i_t(\theta)$ is $\mathcal{F}_{t-1}$ measurable for each
$t \ge 1$.

Note also that by definition, $i_t(\theta)$ is a version of the
conditional expectation w.r.t. $\mathcal{F}_{t-1}$, that is,

    $$i_t(\theta) = E_\theta\{l_t(\theta)\, l_t^T(\theta) \mid \mathcal{F}_{t-1}\}.$$

Everywhere in the present work conditional expectations are meant to be
calculated as integrals w.r.t. the conditional probability densities.
The conditional Fisher information at time $t$ is

    $$I_t(\theta) = \sum_{s=1}^{t} i_s(\theta), \qquad t = 1, 2, \dots.$$

If the $X_t$'s are independent random variables, $I_t(\theta)$ reduces to
the standard Fisher information matrix. Sometimes $I_t(\theta)$ is
referred to as the incremental expected Fisher information. Detailed
discussion of this concept and related work appears in Barndorff-Nielsen
and Sorensen [1], Prakasa-Rao [20] Ch.3, and Hall and Heyde [6].

We say that $\psi = \{\psi_t(\theta, x_t, x_{t-1}, \dots, x_1)\}_{t \ge 1}$
is a sequence of estimating functions and write $\psi \in \Psi$, if for
each $t \ge 1$,
$\psi_t(\theta, x_t, x_{t-1}, \dots, x_1) : \Theta \times \mathbf{X}^t \to \mathbb{R}^m$
is a Borel function.

Let $\psi \in \Psi$ and denote
$\psi_t(\theta) = \psi_t(\theta, X_t, X_{t-1}, \dots, X_1)$. We write
$\psi \in \Psi^M$ if $\psi_t(\theta)$ is a martingale-difference process
for each $\theta \in \Theta$, i.e., if
$E_\theta\{\psi_t(\theta) \mid \mathcal{F}_{t-1}\} = 0$ for each
$t = 1, 2, \dots$ (we assume that the conditional expectations above are
well-defined and $\mathcal{F}_0$ is the trivial $\sigma$-algebra).

Note that if differentiation of the equation

    $$1 = \int f_t(\theta, z \mid x_1^{t-1})\, \mu(dz)$$

is allowed under the integral sign, then
$\{l_t(\theta, X_t \mid X_1^{t-1})\}_{t \ge 1} \in \Psi^M$.

Convention Everywhere in the present work $\theta \in \mathbb{R}^m$ is an
arbitrary but fixed value of the parameter. Convergence and all relations
between random variables are meant with probability one w.r.t. the measure
$P^\theta$ unless specified otherwise. A sequence of random variables
$(\xi_t)_{t \ge 1}$ has some property eventually if for every $\omega$ in
a set $\Omega^\theta$ of $P^\theta$ probability 1, $\xi_t$ has this
property for all $t$ greater than some $t_0(\omega) < \infty$.

3 Main results

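Suppose that $\psi \in \Psi$ and $\Gamma_t(\theta)$, for each
$\theta \in \mathbb{R}^m$, is a predictable $m \times m$ matrix process
with $\det \Gamma_t(\theta) \ne 0$, $t \ge 1$. Consider the estimator
$\hat\theta_t$ defined by

(3.1)    $$\hat\theta_t = \hat\theta_{t-1} + \Gamma_t^{-1}(\hat\theta_{t-1})\, \psi_t(\hat\theta_{t-1}), \quad t \ge 1,$$

where $\hat\theta_0 \in \mathbb{R}^m$ is an arbitrary initial point.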
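
The structure of (3.1) can be sketched generically in the one-dimensional case. In the Python sketch below the callables `psi` and `gamma` and their argument order are hypothetical conventions chosen for this illustration. The one feature taken from the text is that `gamma` sees only the past observations (predictability), while `psi` may also use the current one.

```python
def recursive_estimator(theta0, observations, psi, gamma):
    """Scalar version of (3.1): theta_t = theta_{t-1} + psi_t(theta_{t-1}) / Gamma_t(theta_{t-1}).

    psi(t, theta, x, past)  -> value of the estimating function at step t
    gamma(t, theta, past)   -> nonzero normalizer, computed from X_1..X_{t-1} only
    """
    theta = theta0
    path = [theta]
    past = []
    for t, x in enumerate(observations, start=1):
        g = gamma(t, theta, past)                  # predictable: x is not passed in
        theta = theta + psi(t, theta, x, past) / g
        past.append(x)
        path.append(theta)
    return path


# Linear special case: psi = x - theta and Gamma_t = t reproduce the sample mean.
xs = [2.0, -1.0, 4.0, 0.5, 10.0]   # toy sample (hypothetical)
path = recursive_estimator(
    0.0,
    xs,
    psi=lambda t, theta, x, past: x - theta,
    gamma=lambda t, theta, past: float(t),
)
```

Keeping `past` as a list is only for generality. In the examples of Section 4 the dependence on the past reduces to a few running quantities, so no data need to be stored.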

Let $\theta \in \mathbb{R}^m$ be an arbitrary but fixed value of the
parameter and for any $u \in \mathbb{R}^m$ define

    $$b_t(\theta, u) = E_\theta\{\psi_t(\theta + u) \mid \mathcal{F}_{t-1}\}, \quad t \ge 1.$$
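Theorem 3.1 Suppose that

(C1) $u^T \Gamma_t^{-1}(\theta + u)\, b_t(\theta, u) < 0$ for each
$u \ne 0$, $P^\theta$-a.s.$^1$;

(C2) for each $\varepsilon \in (0, 1)$,

    $$\sum_{t=1}^{\infty} \inf_{\varepsilon \le \|u\| \le 1/\varepsilon} \left| u^T \Gamma_t^{-1}(\theta + u)\, b_t(\theta, u) \right| = \infty, \quad P^\theta\text{-a.s.};$$

(C3) there exists a predictable scalar process $(B_t^\theta)_{t \ge 1}$
such that

    $$E_\theta\{\|\Gamma_t^{-1}(\theta + u)\, \psi_t(\theta + u)\|^2 \mid \mathcal{F}_{t-1}\} \le B_t^\theta (1 + \|u\|^2)$$

for each $u \in \mathbb{R}^m$, $P^\theta$-a.s., and

    $$\sum_{t=1}^{\infty} B_t^\theta < \infty, \quad P^\theta\text{-a.s.}$$

Then $\hat\theta_t$ is strongly consistent (i.e.,
$\hat\theta_t \to \theta$ $P^\theta$-a.s.) for any initial value
$\hat\theta_0$.

We will derive this theorem from a more general result (see the end of the
section). Let us first comment on the conditions used here.

$^1$ Note that the set of $P^\theta$ probability 0 where the inequalities
in (C1) and (C3) are not valid should not depend on $u$.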
Remark 3.1 Conditions (C1), (C2), and (C3) are natural analogues of the
corresponding assumptions in the theory of stochastic approximation.
Indeed, let us consider the i.i.d. case with

    $$f_t(\theta, z \mid x_1^{t-1}) = f(\theta, z), \qquad \psi_t(\theta) = \psi(\theta, z)|_{z = X_t},$$

where $\int \psi(\theta, z) f(\theta, z)\, \mu(dz) = 0$ and
$\Gamma_t(\theta) = t\gamma(\theta)$ for some invertible non-random matrix
$\gamma(\theta)$. Then

    $$b_t(\theta, u) = b(\theta, u) = \int \psi(\theta + u, z) f(\theta, z)\, \mu(dz),$$

implying that $b(\theta, 0) = 0$. Denote
$\Delta_t = \hat\theta_t - \theta$ and rewrite (3.1) in the form

(3.2)    $$\Delta_t = \Delta_{t-1} + \frac{1}{t}\left(\gamma^{-1}(\theta + \Delta_{t-1})\, b(\theta, \Delta_{t-1}) + \varepsilon_t^\theta\right),$$

where

    $$\varepsilon_t^\theta = \gamma^{-1}(\theta + \Delta_{t-1}) \left\{\psi(\theta + \Delta_{t-1}, X_t) - b(\theta, \Delta_{t-1})\right\}.$$

Equation (3.2) defines a Robbins-Monro stochastic approximation procedure
that converges to the solution of the equation

    $$R^\theta(u) := \gamma^{-1}(\theta + u)\, b(\theta, u) = 0,$$

when the values of the function $R^\theta(u)$ can only be observed with
zero expectation errors $\varepsilon_t^\theta$. Note that in general,
recursion (3.1) cannot be considered in the framework of classical
stochastic approximation theory (see Lazrieva et al [13], [14] for the
generalized Robbins-Monro stochastic approximation procedures). For the
i.i.d. case, conditions (C1), (C2) and (C3) can be written as (I) and (II)
in Section 4, which are standard assumptions for stochastic approximation
procedures of type (3.2) (see, e.g., Robbins and Monro [22], Gladyshev
[5], Khas'minskii and Nevelson [11], Ljung and Soderstrom [19], Ljung et
al [18]).

Remark 3.2 To understand how the procedure works, consider the
one-dimensional case, denote $\Delta_t = \hat\theta_t - \theta$ and
rewrite (3.1) in the form

    $$\Delta_t = \Delta_{t-1} + \Gamma_t^{-1}(\theta + \Delta_{t-1})\, \psi_t(\theta + \Delta_{t-1}).$$

Then,

    $$E_\theta\{\hat\theta_t - \hat\theta_{t-1} \mid \mathcal{F}_{t-1}\} = E_\theta\{\Delta_t - \Delta_{t-1} \mid \mathcal{F}_{t-1}\} = \Gamma_t^{-1}(\theta + \Delta_{t-1})\, b_t(\theta, \Delta_{t-1}).$$

Suppose now that at time $t - 1$, $\hat\theta_{t-1} < \theta$, that is,
$\Delta_{t-1} < 0$. Then, by (C1),
$\Gamma_t^{-1}(\theta + \Delta_{t-1})\, b_t(\theta, \Delta_{t-1}) > 0$,
implying that
$E_\theta\{\hat\theta_t - \hat\theta_{t-1} \mid \mathcal{F}_{t-1}\} > 0$.
So, the next step $\hat\theta_t$ will be in the direction of $\theta$. If
at time $t - 1$, $\hat\theta_{t-1} > \theta$, by the same reason,
$E_\theta\{\hat\theta_t - \hat\theta_{t-1} \mid \mathcal{F}_{t-1}\} < 0$.
So, condition (C1) ensures that, on average, at each step the procedure
moves towards $\theta$. However, the magnitude of the jumps
$\hat\theta_t - \hat\theta_{t-1}$ should decrease, for otherwise,
$\hat\theta_t$ may oscillate around $\theta$ without approaching it. This
is guaranteed by (C3). On the other hand, (C2) ensures that the jumps do
not decrease too rapidly to avoid failure of $\hat\theta_t$ to reach
$\theta$.

Now, let us consider a maximum likelihood type recursive estimator

    $$\hat\theta_t = \hat\theta_{t-1} + I_t^{-1}(\hat\theta_{t-1})\, l_t(\hat\theta_{t-1}), \quad t \ge 1,$$

where
$l_t(\theta) = \dot f_t^T(\theta, X_t \mid X_1^{t-1}) / f_t(\theta, X_t \mid X_1^{t-1})$
and $I_t(\theta)$ is the conditional Fisher information with
$\det I_t(\theta) \ne 0$ (see also (1.3) for the i.i.d. case). By Theorem
3.1, $\hat\theta_t$ is strongly consistent if conditions (C1), (C2) and
(C3) are satisfied with $l_t(\theta)$ and $I_t(\theta)$ replacing
$\psi_t(\theta)$ and $\Gamma_t(\theta)$ respectively. On the other hand,
if e.g., in the one-dimensional case, $b_t(\theta, u)$ is differentiable
at $u = 0$ and the differentiation is allowed under the integral sign,
then

    $$\frac{\partial}{\partial u} b_t(\theta, u)\Big|_{u=0} = E_\theta\{\dot l_t(\theta) \mid \mathcal{F}_{t-1}\}.$$

So, if the differentiation w.r.t. $\theta$ of
$E_\theta\{l_t(\theta) \mid \mathcal{F}_{t-1}\} = 0$ is allowed under the
integral sign,
$\frac{\partial}{\partial u} b_t(\theta, u)|_{u=0} = -i_t(\theta)$,
implying that (C1) always holds for small values of $u \ne 0$.

Condition (C2) in the i.i.d. case is a requirement that the function
$\gamma^{-1}(\theta + u)\, b(\theta, u)$ is separated from zero on each
finite interval that does not contain 0. For the i.i.d. case with
continuous w.r.t. $u$ functions $b(\theta, u)$ and $\gamma(\theta + u)$,
condition (C2) is an easy consequence of (C1).

Condition (C3) is a boundedness type assumption which restricts the growth
of $\psi_t(\theta)$ w.r.t. $\theta$ with certain uniformity w.r.t. $t$.
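
As a sanity check on how (C1), (C2) and (C3) interact, the conditions can be verified by hand in the simplest case. The worked example below is not taken from the paper: it assumes i.i.d. $N(\theta, 1)$ observations in the one-dimensional case with the maximum likelihood choice $\psi_t = l_t$ and $\Gamma_t = I_t$, for which the recursion reduces to the sample mean.

```latex
% Worked check of (C1)-(C3) for i.i.d. N(\theta, 1) observations (illustrative assumption).
\begin{align*}
l_t(\theta) &= X_t - \theta, \qquad I_t(\theta) = t, \\
b_t(\theta, u) &= E_\theta\{X_t - \theta - u \mid \mathcal{F}_{t-1}\} = -u, \\
\text{(C1):}\quad u\, I_t^{-1}(\theta + u)\, b_t(\theta, u) &= -\frac{u^2}{t} < 0 \quad (u \neq 0), \\
\text{(C2):}\quad \sum_{t=1}^{\infty} \inf_{\varepsilon \le |u| \le 1/\varepsilon} \frac{u^2}{t}
  &= \varepsilon^2 \sum_{t=1}^{\infty} \frac{1}{t} = \infty, \\
\text{(C3):}\quad E_\theta\Big\{\frac{(X_t - \theta - u)^2}{t^2} \,\Big|\, \mathcal{F}_{t-1}\Big\}
  &= \frac{1 + u^2}{t^2}, \qquad B_t^\theta = \frac{1}{t^2}, \qquad \sum_{t=1}^{\infty} \frac{1}{t^2} < \infty.
\end{align*}
```

The harmonic series in (C2) diverges while the series of squares in (C3) converges, which is the usual balance between step sizes that are large enough to reach $\theta$ and small enough to settle there.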

We denote by $\eta^+$ (respectively $\eta^-$) the positive (respectively
negative) part of $\eta$.

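Theorem 3.2 Suppose that for $\theta \in \mathbb{R}^m$ there exists a real
valued nonnegative function $V_\theta(u) : \mathbb{R}^m \to \mathbb{R}$
having continuous and bounded partial second derivatives and

(G1) $V_\theta(0) = 0$, and for each $\varepsilon \in (0, 1)$,

    $$\inf_{\|u\| \ge \varepsilon} V_\theta(u) > 0;$$

(G2) there exists a set $A \in \mathcal{F}$ with $P^\theta(A) > 0$ such
that for each $\varepsilon \in (0, 1)$,

    $$\sum_{t=1}^{\infty} \inf_{\varepsilon \le V_\theta(u) \le 1/\varepsilon} [\mathcal{N}_t(u)]^- = \infty$$

on $A$, where

    $$\mathcal{N}_t(u) = \dot V_\theta(u)\, \Gamma_t^{-1}(\theta + u)\, b_t(\theta, u)
      + \frac{1}{2} \sup_v \|\ddot V_\theta(v)\|\, E_\theta\{\|\Gamma_t^{-1}(\theta + u)\, \psi_t(\theta + u)\|^2 \mid \mathcal{F}_{t-1}\};$$

(G3) for $\Delta_t = \hat\theta_t - \theta$,

    $$\sum_{t=1}^{\infty} (1 + V_\theta(\Delta_{t-1}))^{-1} [\mathcal{N}_t(\Delta_{t-1})]^+ < \infty, \quad P^\theta\text{-a.s.}$$

Then $\hat\theta_t \to \theta$ ($P^\theta$-a.s.) for any initial value
$\hat\theta_0$.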

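Proof. As always (see the convention in Section 2), convergence and all
relations between random variables are meant with probability one w.r.t.
the measure $P^\theta$ unless specified otherwise. Rewrite (3.1) in the
form

    $$\Delta_t = \Delta_{t-1} + \Gamma_t^{-1}(\theta + \Delta_{t-1})\, \psi_t(\theta + \Delta_{t-1}).$$

By the Taylor expansion,

    $$V_\theta(\Delta_t) = V_\theta(\Delta_{t-1}) + \dot V_\theta(\Delta_{t-1})\, \Gamma_t^{-1}(\theta + \Delta_{t-1})\, \psi_t(\theta + \Delta_{t-1})$$
    $$\qquad + \frac{1}{2} \left[\Gamma_t^{-1}(\theta + \Delta_{t-1})\, \psi_t(\theta + \Delta_{t-1})\right]^T \ddot V_\theta(\tilde\Delta_t)\, \Gamma_t^{-1}(\theta + \Delta_{t-1})\, \psi_t(\theta + \Delta_{t-1}),$$

where $\tilde\Delta_t \in \mathbb{R}^m$. Taking the conditional
expectation w.r.t. $\mathcal{F}_{t-1}$ yields

    $$E_\theta\{V_\theta(\Delta_t) \mid \mathcal{F}_{t-1}\} \le V_\theta(\Delta_{t-1}) + \mathcal{N}_t(\Delta_{t-1}).$$

Using the obvious decomposition
$\mathcal{N}_t(\Delta_{t-1}) = [\mathcal{N}_t(\Delta_{t-1})]^+ - [\mathcal{N}_t(\Delta_{t-1})]^-$,
the previous inequality can be rewritten as

(3.3)    $$E_\theta\{V_\theta(\Delta_t) \mid \mathcal{F}_{t-1}\} \le V_\theta(\Delta_{t-1})(1 + B_t) + B_t - [\mathcal{N}_t(\Delta_{t-1})]^-,$$

where

    $$B_t = (1 + V_\theta(\Delta_{t-1}))^{-1} [\mathcal{N}_t(\Delta_{t-1})]^+.$$

By condition (G3),

(3.4)    $$\sum_{t=1}^{\infty} B_t < \infty.$$

According to Lemma A1 in Appendix A (with $X_n = V_\theta(\Delta_n)$,
$\beta_{n-1} = \xi_{n-1} = B_n$ and
$\zeta_{n-1} = [\mathcal{N}_n(\Delta_{n-1})]^-$), inequalities (3.3) and
(3.4) imply that the processes $V_\theta(\Delta_t)$ and

    $$Y_t = \sum_{s=1}^{t} [\mathcal{N}_s(\Delta_{s-1})]^-$$

converge to some finite limits. It therefore follows that
$V_\theta(\Delta_t) \to r \ge 0$. Suppose that $\{r > 0\}$. Then there
exists $\varepsilon > 0$ such that
$\varepsilon \le V_\theta(\Delta_t) \le 1/\varepsilon$ eventually. Because
of (G2), this implies that for some (possibly random) $t_0$,

    $$\sum_{s=t_0}^{\infty} [\mathcal{N}_s(\Delta_{s-1})]^- \ge \sum_{s=t_0}^{\infty} \inf_{\varepsilon \le V_\theta(u) \le 1/\varepsilon} [\mathcal{N}_s(u)]^- = \infty$$

on the set $A$ with $P^\theta(A) > 0$, which contradicts the existence of
a finite limit of $Y_t$. Hence, $r = 0$ and so,
$V_\theta(\Delta_t) \to 0$. Now, $\Delta_t \to 0$ follows from (G1)
(otherwise there would exist a sequence $t_k \to \infty$ such that
$\|\Delta_{t_k}\| \ge \varepsilon$ for some $\varepsilon > 0$, and (G1)
would imply that $\inf_k V_\theta(\Delta_{t_k}) > 0$). ◊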

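Proof of Theorem 3.1. As always (see the convention in Section 2),
convergence and all relations between random variables are meant with
probability one w.r.t. the measure $P^\theta$ unless specified otherwise.
Let us show that the conditions of Theorem 3.1 imply those in Theorem 3.2
with $V_\theta(u) = (u, u) = u^T u = \|u\|^2$. Condition (G1) trivially
holds. Since $\dot V_\theta(u) = 2u^T$ and
$\ddot V_\theta(u) = 2 \times \mathbf{1}$, it follows that

(3.5)    $$\mathcal{N}_t(u) = 2u^T \Gamma_t^{-1}(\theta + u)\, b_t(\theta, u) + E_\theta\{\|\Gamma_t^{-1}(\theta + u)\, \psi_t(\theta + u)\|^2 \mid \mathcal{F}_{t-1}\}.$$

Then, by (C1) and (C3),

    $$\sum_{t=1}^{\infty} (1 + \|\Delta_{t-1}\|^2)^{-1} [\mathcal{N}_t(\Delta_{t-1})]^+
      \le \sum_{t=1}^{\infty} (1 + \|\Delta_{t-1}\|^2)^{-1} E_\theta\{\|\Gamma_t^{-1}(\theta + \Delta_{t-1})\, \psi_t(\theta + \Delta_{t-1})\|^2 \mid \mathcal{F}_{t-1}\}$$

(3.6)    $$\le \sum_{t=1}^{\infty} B_t^\theta < \infty.$$

So, (G3) holds. To derive (G2), using the obvious inequality
$[a]^- \ge -a$ and (C1), we write

    $$\inf [\mathcal{N}_t(u)]^- \ge \inf \left[-2u^T \Gamma_t^{-1}(\theta + u)\, b_t(\theta, u) - E_\theta\{\|\Gamma_t^{-1}(\theta + u)\, \psi_t(\theta + u)\|^2 \mid \mathcal{F}_{t-1}\}\right]$$
    $$\ge \inf \left|2u^T \Gamma_t^{-1}(\theta + u)\, b_t(\theta, u)\right| - \sup \left[E_\theta\{\|\Gamma_t^{-1}(\theta + u)\, \psi_t(\theta + u)\|^2 \mid \mathcal{F}_{t-1}\}\right],$$

where inf's and sup's are taken over
$\{u : \varepsilon \le \|u\| \le 1/\varepsilon\}$ (as $\varepsilon$ ranges
over $(0, 1)$, these sets coincide with the sets
$\{u : \varepsilon \le V_\theta(u) \le 1/\varepsilon\}$ appearing in
(G2)). From (C3),

    $$\sup \left[E_\theta\{\|\Gamma_t^{-1}(\theta + u)\, \psi_t(\theta + u)\|^2 \mid \mathcal{F}_{t-1}\}\right] \le B_t^\theta (1 + 1/\varepsilon^2)$$

and $\sum_{t=1}^{\infty} B_t^\theta < \infty$. Now, using (C2), we finally
obtain

    $$\sum_{t=1}^{\infty} \inf [\mathcal{N}_t(u)]^- \ge \sum_{t=1}^{\infty} \inf \left|2u^T \Gamma_t^{-1}(\theta + u)\, b_t(\theta, u)\right| - (1 + 1/\varepsilon^2) \sum_{t=1}^{\infty} B_t^\theta = \infty,$$

which implies (G2). So, Theorem 3.1 follows on application of Theorem 3.2.
◊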



Remark 3.3 It follows from the proof of Theorem 3.2 that if conditions
(G1) and (G3) are satisfied then $\|\hat\theta_t - \theta\|$ converges
($P^\theta$-a.s.) to a finite limit, for any initial value
$\hat\theta_0$. In particular, to guarantee this convergence, it suffices
to require conditions (C1) and (C3) of Theorem 3.1 (this can be seen by
taking $V_\theta(u) = (u, u) = u^T u = \|u\|^2$ and (3.6)).

4 SPECIAL MODELS AND EXAMPLES 

4.1 The i.i.d. scheme. 

Consider the classical scheme of i.i.d. observations $X_1, X_2, \dots$,
with a common probability density/mass function $f(\theta, x)$,
$\theta \in \mathbb{R}^m$. Suppose that $\psi(\theta, z)$ is an estimating
function with

    $$\int \psi(\theta, z) f(\theta, z)\, \mu(dz) = 0.$$

Let us define the recursive estimator $\hat\theta_t$ by

(4.1)    $$\hat\theta_t = \hat\theta_{t-1} + \frac{1}{t}\, \gamma^{-1}(\hat\theta_{t-1})\, \psi(\hat\theta_{t-1}, X_t), \quad t \ge 1,$$

where $\gamma(\theta)$ is a non-random matrix such that
$\gamma^{-1}(\theta)$ exists for any $\theta \in \mathbb{R}^m$ and
$\hat\theta_0 \in \mathbb{R}^m$ is any initial value.

Corollary 4.1 Suppose that for any $\theta \in \mathbb{R}^m$, the
following conditions hold.

(I) For any $0 < \varepsilon < 1$,

    $$\sup_{\varepsilon \le \|u\| \le 1/\varepsilon} u^T \gamma^{-1}(\theta + u) \int \psi(\theta + u, x) f(\theta, x)\, \mu(dx) < 0.$$

(II) For each $u \in \mathbb{R}^m$,

    $$\int \|\gamma^{-1}(\theta + u)\, \psi(\theta + u, x)\|^2 f(\theta, x)\, \mu(dx) \le K_\theta (1 + \|u\|^2)$$

for some constant $K_\theta$.
Then the estimator $\hat\theta_t$ is strongly consistent for any initial
value $\hat\theta_0$.

Proof Since
$b_t(\theta, u) = b(\theta, u) = \int \psi(\theta + u, z) f(\theta, z)\, \mu(dz)$
and $\Gamma_t(\theta) = t\gamma(\theta)$, it is easy to see that (I) and
(II) imply (C1), (C2) and (C3) from Theorem 3.1, which yields
$(\hat\theta_t - \theta) \to 0$ ($P^\theta$-a.s.). ◊

Similar results (for i.i.d. schemes) were obtained by Khas'minskii and
Nevelson [11] Ch.8, §4, and Fabian [4]. Note that conditions (I) and (II)
are derived from Theorem 3.1 and are sufficient conditions for the
convergence of (4.1). Applying Theorem 3.2 to (4.1), one can obtain
various alternative sufficient conditions analogous to those given in
Fabian (1978). Note also that, in (4.1), the normalising sequence is
$\Gamma_t(\theta) = t\gamma(\theta)$, but Theorems 3.1 and 3.2 allow one
to consider procedures with arbitrary predictable $\Gamma_t(\theta)$.
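
A simple non-linear instance of (4.1) is a recursive version of the median. The Python sketch below takes $\psi(\theta, x) = \mathrm{sign}(x - \theta)$ and a constant $\gamma^{-1}$, called `gain` here. This particular choice, the Gaussian data, the seed and the sample size are assumptions made for illustration and are not an example worked out in the text. For a continuous density that is positive near the median, this bounded $\psi$ satisfies (I) and (II).

```python
import random


def recursive_median(xs, theta0=0.0, gain=1.0):
    """Recursion (4.1) with psi(theta, x) = sign(x - theta) and gamma^{-1} = gain."""
    theta = theta0
    for t, x in enumerate(xs, start=1):
        sign = (x > theta) - (x < theta)   # sign(x - theta), equal to 0 on ties
        theta += gain * sign / t
    return theta


rng = random.Random(1)                      # fixed seed, illustration only
true_median = 3.0
xs = [rng.gauss(true_median, 1.0) for _ in range(20000)]
estimate = recursive_median(xs)
```

Unlike the exact sample median, this update does not require sorting or storing the data. The price is that the result depends on `gain`, which plays the role of a tuning constant.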

4.2 Linear procedures. 

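Consider the recursion

(4.2)    $$\hat\theta_t = \hat\theta_{t-1} + \Gamma_t^{-1}\left(h_t - \gamma_t \hat\theta_{t-1}\right), \quad t \ge 1,$$

where the $\Gamma_t$ and $\gamma_t$ are predictable processes, $h_t$ is an
adapted process (i.e., $h_t$ is $\mathcal{F}_t$-measurable for
$t \ge 1$) and all three are independent of $\theta$. The following result
gives a set of sufficient conditions for the convergence of (4.2) in the
case when the linear $\psi_t(\theta) = h_t - \gamma_t\theta$ is a
martingale-difference.

Corollary 4.2 Suppose that for any $\theta \in \mathbb{R}$,

(a) $E_\theta\{h_t \mid \mathcal{F}_{t-1}\} = \gamma_t\theta$, for
$t \ge 1$, $P^\theta$-a.s.,

(b) $0 \le \gamma_t/\Gamma_t \le 2 - \delta$ eventually for some
$\delta > 0$, and

    $$\sum_{t=1}^{\infty} \gamma_t/\Gamma_t = \infty$$

on a set $A$ of positive probability $P^\theta$,

(c)
    $$\sum_{t=1}^{\infty} \frac{E_\theta\{(h_t - \gamma_t\theta)^2 \mid \mathcal{F}_{t-1}\}}{\Gamma_t^2} < \infty, \quad P^\theta\text{-a.s.}$$

Then $\hat\theta_t \to \theta$ ($P^\theta$-a.s.) for any initial value
$\hat\theta_0 \in \mathbb{R}$.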

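Proof. We need to check that the conditions of Theorem 3.2 hold for
$V_\theta(u) = u^2$. Using (a) we obtain

    $$b_t(\theta, u) = E_\theta\{(h_t - (\theta + u)\gamma_t) \mid \mathcal{F}_{t-1}\} = -u\gamma_t$$

and

    $$E_\theta\{(\psi_t(\theta + u))^2 \mid \mathcal{F}_{t-1}\} = E_\theta\{(h_t - (\theta + u)\gamma_t)^2 \mid \mathcal{F}_{t-1}\}
      = E_\theta\{(h_t - \theta\gamma_t)^2 \mid \mathcal{F}_{t-1}\} + u^2\gamma_t^2 = \mathcal{P}_t + u^2\gamma_t^2,$$

where
$\mathcal{P}_t = E_\theta\{(h_t - \theta\gamma_t)^2 \mid \mathcal{F}_{t-1}\}$.
Now, using (3.5),

    $$\mathcal{N}_t(u) = -2u^2\gamma_t\Gamma_t^{-1} + \Gamma_t^{-2}\mathcal{P}_t + u^2\gamma_t^2\Gamma_t^{-2}$$
    $$= -\delta u^2\gamma_t\Gamma_t^{-1} - u^2\gamma_t\Gamma_t^{-1}\left((2 - \delta) - \gamma_t\Gamma_t^{-1}\right) + \Gamma_t^{-2}\mathcal{P}_t.$$

To derive (G2), we use the obvious inequality $[a]^- \ge -a$ (for any
$a$), conditions (b) and (c), and write, for some (possibly random) $t_0$,

    $$\sum_{t=t_0}^{\infty} \inf_{\varepsilon \le u^2 \le 1/\varepsilon} [\mathcal{N}_t(u)]^-
      \ge \varepsilon\delta \sum_{t=t_0}^{\infty} \gamma_t\Gamma_t^{-1} - \sum_{t=t_0}^{\infty} \Gamma_t^{-2}\mathcal{P}_t = \infty$$

on $A$. To check (G3) we write

    $$\sum_{t=1}^{\infty} (1 + \Delta_{t-1}^2)^{-1} [\mathcal{N}_t(\Delta_{t-1})]^+
      \le \sum_{t=1}^{\infty} (1 + \Delta_{t-1}^2)^{-1}\, \Gamma_t^{-2}\mathcal{P}_t
      \le \sum_{t=1}^{\infty} \Gamma_t^{-2}\mathcal{P}_t < \infty$$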
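($P^\theta$-a.s.), which completes the proof. ◊

Remark 4.1 Suppose that $\Delta\Gamma_t = \gamma_t$. Then

(4.3)    $$\hat\theta_t = \Gamma_t^{-1}\left(\hat\theta_0 + \sum_{s=1}^{t} h_s(X_s)\right).$$

This can be easily seen by inspecting the difference
$\hat\theta_t - \hat\theta_{t-1}$ for the sequence (4.3), to check that
(4.2) holds. It is also interesting to observe that since in this case,
$\Gamma_t = \sum_{s=1}^{t} \gamma_s$,

    $$\hat\theta_t = \Gamma_t^{-1}\hat\theta_0 + \Gamma_t^{-1} \sum_{s=1}^{t} \left(h_s(X_s) - \gamma_s\theta\right) + \theta
      = \Gamma_t^{-1}\hat\theta_0 + \Gamma_t^{-1} M_t^\theta + \theta,$$

where $M_t^\theta = \sum_{s=1}^{t} \left(h_s(X_s) - \gamma_s\theta\right)$
is a martingale. Now, if $\Gamma_t \to \infty$, a necessary and sufficient
condition for the convergence to $\theta$ is the convergence to zero of
the sequence $\Gamma_t^{-1} M_t^\theta$. Condition (c) in Corollary 4.2 is
a standard sufficient condition in martingale theory to guarantee
$\Gamma_t^{-1} M_t^\theta \to 0$ (see e.g., Shiryayev [29], Ch.VII, §5
Theorem 4). The first part of (b) will trivially hold if
$\gamma_t = \Delta\Gamma_t \ge 0$. Also, in this case,
$\Gamma_t \to \infty$ implies
$\sum_{t=1}^{\infty} \Delta\Gamma_t/\Gamma_t = \infty$ (see Proposition A3
in Appendix A).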

Remark 4.2 As a particular example, consider the process

    $$X_t = \theta X_{t-1} + \xi_t, \quad t \ge 1,$$

where $\xi_t$ is a martingale-difference with
$D_t = E_\theta\{\xi_t^2 \mid \mathcal{F}_{t-1}\} > 0$. The choice
$h_t = D_t^{-1} X_{t-1} X_t$ and
$\Delta\Gamma_t = \gamma_t = D_t^{-1} X_{t-1}^2$ in (4.2) yields the least
squares estimator of $\theta$. It is easy to verify that (a) holds. Also,
since

    $$E_\theta\{(h_t - \gamma_t\theta)^2 \mid \mathcal{F}_{t-1}\} = D_t^{-2} X_{t-1}^2\, E_\theta\{\xi_t^2 \mid \mathcal{F}_{t-1}\} = D_t^{-1} X_{t-1}^2 = \Delta\Gamma_t,$$

it follows that (c) in Corollary 4.2 is equivalent to
$\sum_{t=1}^{\infty} \Delta\Gamma_t/\Gamma_t^2 < \infty$. This, as well as
(b), holds if $\Gamma_t \to \infty$ (see Proposition A3 in Appendix A).
So, if $\Gamma_t \to \infty$ the least squares procedure is strongly
consistent. If, e.g., $\xi_t$ are i.i.d. r.v.'s, then
$\Gamma_t \to \infty$ for all values of $\theta \in \mathbb{R}$ (see, e.g.,
Shiryayev [29], Ch.VII, 5.5).
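
The least squares recursion of Remark 4.2 can be sketched as follows for $D_t = 1$. The starting value `gamma0 = 1.0` for the normalizing sequence is an assumption added here so that the first division is well defined when $X_0 = 0$. The Gaussian innovations, the seed and the sample size are likewise illustrative.

```python
import random


def recursive_least_squares_ar1(xs, theta0=0.0, gamma0=1.0):
    """Recursion (4.2) with h_t = X_{t-1} X_t and gamma_t = X_{t-1}^2 (case D_t = 1)."""
    theta = theta0
    gamma = gamma0                       # Gamma_0 > 0 is an assumption of this sketch
    for t in range(1, len(xs)):
        x_prev, x_now = xs[t - 1], xs[t]
        gamma += x_prev * x_prev         # Gamma_t = Gamma_{t-1} + gamma_t
        theta += x_prev * (x_now - theta * x_prev) / gamma
    return theta, gamma


rng = random.Random(2)                   # fixed seed, illustration only
true_theta = 0.5
xs = [0.0]
for _ in range(5000):
    xs.append(true_theta * xs[-1] + rng.gauss(0.0, 1.0))

theta_hat, gamma_hat = recursive_least_squares_ar1(xs)
# With theta0 = 0 the recursion telescopes to sum(X_{s-1} X_s) / Gamma_t
closed_form = sum(a * b for a, b in zip(xs[:-1], xs[1:])) / gamma_hat
```

Multiplying the update by $\Gamma_t$ gives $\Gamma_t\hat\theta_t = \Gamma_{t-1}\hat\theta_{t-1} + h_t$, so the recursive and closed-form values agree up to rounding. This is the same telescoping argument that underlies (4.3).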

4.3 AR(m) process 

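Consider an AR(m) process

    $$X_i = \theta_1 X_{i-1} + \dots + \theta_m X_{i-m} + \xi_i = \theta^T X_{i-m}^{i-1} + \xi_i,$$

where $X_{i-m}^{i-1} = (X_{i-1}, \dots, X_{i-m})^T$,
$\theta = (\theta_1, \dots, \theta_m)^T$ and $\xi_i$ is a sequence of
i.i.d. random variables.

A reasonable class of procedures in this model should have a form

(4.4)    $$\hat\theta_t = \hat\theta_{t-1} + \Gamma_t^{-1}(\hat\theta_{t-1})\, \psi_t\left(X_t - \hat\theta_{t-1}^T X_{t-m}^{t-1}\right),$$

where $\psi_t(z)$ and $\Gamma_t^{-1}(z)$ ($z \in \mathbb{R}^m$) are
respectively vector and matrix processes meeting conditions of the
previous section. Suppose that the probability density function of $\xi_t$
w.r.t. Lebesgue's measure is $g(x)$. Then the conditional probability
density function is
$f_t(\theta, x_t \mid x_1^{t-1}) = g(x_t - \theta^T x_{t-m}^{t-1})$. So,
denoting

(4.5)    $$\psi_t(z) = -\frac{g'(z)}{g(z)}\, X_{t-m}^{t-1},$$

it is easy to see that

    $$\psi_t\left(X_t - \theta^T X_{t-m}^{t-1}\right) = \frac{\dot f_t^T(\theta, X_t \mid X_1^{t-1})}{f_t(\theta, X_t \mid X_1^{t-1})}$$

and (4.4) becomes a likelihood recursive procedure. A possible choice of
$\Gamma_t(z)$ in this case would be the conditional Fisher information
matrix

    $$I_t = i^g \sum_{s=1}^{t} X_{s-m}^{s-1} \left(X_{s-m}^{s-1}\right)^T,$$

where

    $$i^g = \int \left(\frac{g'(z)}{g(z)}\right)^2 g(z)\, dz.$$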

An interesting class of recursive estimators for strongly stationary
AR(m) processes is studied in Campbell [2]. These estimators are recursive
versions of robust modifications of the least squares method and are
defined as

(4.6)    $$\hat\theta_t = \hat\theta_{t-1} + a_t\, \gamma(X_{t-m}^{t-1})\, \phi\left(X_t - \hat\theta_{t-1}^T X_{t-m}^{t-1}\right),$$

where $a_t$ is a sequence of positive numbers with $a_t \to 0$, $\phi$ is
a bounded scalar function and $\gamma(u)$ is a vector function of the form
$u\,h(u)$ for some non-negative function $h$ of $u$ (see also Leonov
[17]). The class of procedures of type (4.6) is clearly a subclass of that
defined by (4.4) and therefore can be studied using the results of the
previous section.

Suppose that $\xi_i$ are i.i.d. random variables with a bell-shaped,
symmetric about zero probability density function $g(z)$ (that is,
$g(-z) = g(z)$, and $g \downarrow 0$ on $\mathbb{R}_+$). Suppose also that
$\phi(x)$ is an odd, continuous in zero function. Let us write conditions
of Theorem 3.1 for

(4.7)    $$\Gamma_t(\theta) = a_t^{-1}\mathbf{1} \quad \text{and} \quad \psi_t(\theta) = X_{t-m}^{t-1}\, h(X_{t-m}^{t-1})\, \phi\left(X_t - \theta^T X_{t-m}^{t-1}\right).$$
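We have

    $$E_\theta\left\{\phi\left(X_t - (\theta + u)^T X_{t-m}^{t-1}\right) \mid \mathcal{F}_{t-1}\right\}
      = E_\theta\left\{\phi\left(\xi_t - u^T X_{t-m}^{t-1}\right) \mid \mathcal{F}_{t-1}\right\}
      = \int_{-\infty}^{\infty} \phi\left(z - u^T X_{t-m}^{t-1}\right) g(z)\, dz.$$

It follows from Lemma A2 in Appendix A that if $w \ne 0$,

    $$G(w) = -w \int_{-\infty}^{\infty} \phi(z - w)\, g(z)\, dz > 0.$$

Therefore,

    $$u^T \Gamma_t^{-1}(\theta + u)\, b_t(\theta, u) = a_t\, u^T X_{t-m}^{t-1}\, h(X_{t-m}^{t-1})\, E_\theta\left\{\phi\left(\xi_t - u^T X_{t-m}^{t-1}\right) \mid \mathcal{F}_{t-1}\right\}$$

(4.8)    $$= -a_t\, h(X_{t-m}^{t-1})\, G\left(u^T X_{t-m}^{t-1}\right) \le 0.$$

Also, since $\phi$ is a bounded function,

    $$E_\theta\left\{\|\Gamma_t^{-1}(\theta + u)\, \psi_t(\theta + u)\|^2 \mid \mathcal{F}_{t-1}\right\} \le C^\theta a_t^2\, \|X_{t-m}^{t-1}\|^2\, h^2(X_{t-m}^{t-1})$$

for some positive constant $C^\theta$. Therefore, conditions of Theorem
3.1 hold if ($P^\theta$-a.s.),

(4.9)    $$\sum_{t=1}^{\infty} a_t\, h(X_{t-m}^{t-1}) \inf_{\varepsilon \le \|u\| \le 1/\varepsilon} G\left(u^T X_{t-m}^{t-1}\right) = \infty$$

and

(4.10)    $$\sum_{t=1}^{\infty} a_t^2\, \|X_{t-m}^{t-1}\|^2\, h^2(X_{t-m}^{t-1}) < \infty.$$

If $X_t$ is a stationary process, these conditions can be verified using
limit theorems for stationary processes. Suppose, e.g., that
$a_t = 1/t$, $h(x) \ne 0$ for any $x \ne 0$, and $g(z)$ is continuous.
Then
$h(x) \inf_{\varepsilon \le \|u\| \le 1/\varepsilon} G(u^T x) > 0$ for any
$x \ne 0$ (see Appendix A, Lemma A2). Therefore, it follows from an
ergodic theorem for stationary processes that in probability $P^\theta$,

(4.11)    $$\lim_{t \to \infty} \frac{1}{t} \sum_{s=1}^{t} h(X_{s-m}^{s-1}) \inf_{\varepsilon \le \|u\| \le 1/\varepsilon} G\left(u^T X_{s-m}^{s-1}\right) > 0.$$

Now, (4.9) follows from Proposition A4 in Appendix A.
Examples of the procedures of type (4.6) as well as some simulation
results are presented in Campbell [2].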
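
The sign property of $G(w) = -w \int \phi(z - w)\, g(z)\, dz$ that drives (4.8) and (4.9) can be checked numerically for a concrete pair $(\phi, g)$. The sketch below assumes, for illustration, a Student density with 3 degrees of freedom and $\phi(z) = 4z/(3 + z^2)$. The midpoint rule, the truncation of the integral to $[-100, 100]$ and all names are choices made here, not part of the paper.

```python
import math

ALPHA = 3.0
# Normalizing constant of the Student density with ALPHA degrees of freedom
C_ALPHA = math.gamma((ALPHA + 1) / 2) / (math.sqrt(math.pi * ALPHA) * math.gamma(ALPHA / 2))


def g(z):
    """Student density: even, and decreasing on the positive half-line."""
    return C_ALPHA * (1.0 + z * z / ALPHA) ** (-(ALPHA + 1) / 2)


def phi(z):
    """Odd bounded function, positive for z > 0."""
    return (ALPHA + 1) * z / (ALPHA + z * z)


def G(w, half_width=100.0, steps=40000):
    """Midpoint-rule approximation of -w * integral of phi(z - w) g(z) dz."""
    h = 2.0 * half_width / steps
    total = 0.0
    for k in range(steps):
        z = -half_width + (k + 0.5) * h
        total += phi(z - w) * g(z)
    return -w * total * h


values = {w: G(w) for w in (-2.0, -0.5, 0.5, 2.0)}
```

All four values are positive in this approximation, and $G$ is even in $w$ because $\phi$ is odd and $g$ is even.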
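4.4 An explicit example

As a particular example of (4.4), consider the process

    $$X_t = \theta X_{t-1} + \xi_t, \quad t \ge 1,$$

where $\xi_t$, $t \ge 1$, are independent Student random variables with
degrees of freedom $\alpha$. So, the probability density function of
$\xi_t$ is

    $$g(x) = C_\alpha \left(1 + \frac{x^2}{\alpha}\right)^{-\frac{\alpha+1}{2}},$$

where
$C_\alpha = \Gamma((\alpha + 1)/2) / (\sqrt{\pi\alpha}\, \Gamma(\alpha/2))$.
Since

    $$\frac{g'(z)}{g(z)} = -(\alpha + 1)\, \frac{z}{\alpha + z^2}$$

(see also (4.5)),

    $$\frac{\dot f_t(\theta, X_t \mid X_1^{t-1})}{f_t(\theta, X_t \mid X_1^{t-1})} = (\alpha + 1)\, X_{t-1}\, \frac{X_t - \theta X_{t-1}}{\alpha + (X_t - \theta X_{t-1})^2}$$

and the conditional Fisher information is

    $$I_t = i^g \sum_{s=1}^{t} X_{s-1}^2,$$

where

    $$i^g = \int \left(\frac{g'(z)}{g(z)}\right)^2 g(z)\, dz
      = \frac{C_\alpha (\alpha + 1)^2}{\sqrt{\alpha}} \int \frac{z^2\, dz}{(1 + z^2)^{\frac{\alpha+5}{2}}}
      = \frac{C_\alpha (\alpha + 1)^2}{\sqrt{\alpha}}\, \frac{\sqrt{\pi}\, \Gamma((\alpha + 5)/2 - 3/2)}{2\, \Gamma((\alpha + 5)/2)}
      = \frac{\alpha + 1}{\alpha + 3}.$$

Therefore, a likelihood recursive procedure is

(4.12)    $$\hat\theta_t = \hat\theta_{t-1} + I_t^{-1} (\alpha + 1)\, X_{t-1}\, \frac{X_t - \hat\theta_{t-1} X_{t-1}}{\alpha + (X_t - \hat\theta_{t-1} X_{t-1})^2}, \quad t \ge 1,$$

where $\hat\theta_0$ is any starting point. Note that $I_t$ can also be
derived recursively by

    $$I_t = I_{t-1} + i^g X_{t-1}^2.$$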
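
A minimal Python sketch of procedure (4.12) with the recursive update of $I_t$ is given below. Two points are assumptions of this sketch rather than part of the procedure: the information is started at `info0 = 1.0` instead of zero, so that the first steps stay bounded even when $X_0 = 0$, and the Student innovations are simulated as a standard normal divided by the square root of a scaled chi-square variate. The seed and sample size are arbitrary.

```python
import math
import random


def student_t(rng, alpha):
    """Student variate with alpha degrees of freedom: Z / sqrt(chi2_alpha / alpha)."""
    z = rng.gauss(0.0, 1.0)
    chi2 = rng.gammavariate(alpha / 2.0, 2.0)   # chi-square = Gamma(shape alpha/2, scale 2)
    return z / math.sqrt(chi2 / alpha)


def recursive_student_ar1(xs, alpha, theta0=0.0, info0=1.0):
    """Procedure (4.12); info0 > 0 is an assumption of this sketch, not of the text."""
    i_g = (alpha + 1.0) / (alpha + 3.0)          # Fisher information of the innovation density
    theta, info = theta0, info0
    for t in range(1, len(xs)):
        x_prev, x_now = xs[t - 1], xs[t]
        info += i_g * x_prev * x_prev            # I_t = I_{t-1} + i^g X_{t-1}^2
        resid = x_now - theta * x_prev
        theta += (alpha + 1.0) * x_prev * resid / (alpha + resid * resid) / info
    return theta


rng = random.Random(3)                           # fixed seed, illustration only
alpha, true_theta = 3.0, 0.5
xs = [0.0]
for _ in range(5000):
    xs.append(true_theta * xs[-1] + student_t(rng, alpha))

estimate = recursive_student_ar1(xs, alpha)
```

Only the current estimate, the accumulated information and the last observation are carried from one step to the next. The positive starting information acts in the spirit of a tuning constant that damps the first few steps.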

Clearly, (4.12) is a recursive procedure of type (4.6) but with a
stochastic normalizing sequence $a_t = I_t^{-1}$. Now, $\psi_t$ is of the
form (4.7) with $h(u) = 1$ and
$\phi(z) = (\alpha + 1)z/(\alpha + z^2)$, and $g(z)$ is bell-shaped and
symmetric about zero. Therefore, to show convergence to $\theta$, it
suffices to check conditions (4.9) and (4.10), which, in this case can be
written as

(4.13)    $$\sum_{t=1}^{\infty} \frac{1}{I_t} \inf_{\varepsilon \le |u| \le 1/\varepsilon} G(uX_{t-1}) = \infty$$

and

(4.14)    $$\sum_{t=1}^{\infty} \frac{X_{t-1}^2}{I_t^2} < \infty,$$

($P^\theta$-a.s.). We have $I_t \to \infty$ for any
$\theta \in \mathbb{R}$ (see, e.g., Shiryayev [29], Ch.VII, 5.5). Since
$\Delta I_t = i^g (X_{t-1})^2$, we obtain that (4.14) follows from
Proposition A3 in Appendix A. Let us assume now that $|\theta| < 1$. By
Lemma A2 in Appendix A,
$\inf_{\varepsilon \le |u| \le 1/\varepsilon} G(ux) > 0$ for any
$x \ne 0$. Then if we assume that the process $X_t$ is strongly
stationary, it follows from the ergodic theorem that in probability
$P^\theta$,

    $$\lim_{t \to \infty} \frac{1}{t} I_t > 0 \quad \text{and} \quad \lim_{t \to \infty} \frac{1}{t} \sum_{s=1}^{t} \inf_{\varepsilon \le |u| \le 1/\varepsilon} G(uX_{s-1}) > 0.$$

(It can be proved that these hold without the assumption of strong
stationarity.) Therefore, in probability $P^\theta$,
$\lim_{t \to \infty} \frac{1}{I_t} \sum_{s=1}^{t} \inf_{\varepsilon \le |u| \le 1/\varepsilon} G(uX_{s-1}) > 0$
and (4.13) now follows on application of Proposition A4 in Appendix A.

Remark 4.3 We have shown above that the recursive estimator (4.12) is
strongly consistent, i.e., converges to $\theta$ a.s., if $|\theta| < 1$.
It is worth mentioning that (4.14), and therefore (4.10), holds for any
$\theta \in \mathbb{R}$, which guarantees (C3) of Theorem 3.1. Also, (4.8)
implies that (C1) of Theorem 3.1 holds as well. Therefore, according to
Remark 3.3, we obtain that $|\hat\theta_t - \theta|$ converges
($P^\theta$-a.s.) to a finite limit for any $\theta \in \mathbb{R}$.



Figure 1: Realisations of (4.12) for $\alpha = 3$ and $\theta = 0.5$ for three different starting
values $\hat\theta_0 = -0.2$, $0.1$ and $0.7$. The number of observations is 40.

Remark 4.4 Note that conditions (4.13) and (4.14) will still hold if we
replace $I_t$ by $c_t I_t$, where $c_t$ is a sequence of non-negative r.v.'s such that
$c_t = 1$ eventually. So, the procedure (4.12) will remain consistent if $I_t$ is
replaced by $c_t I_t$, i.e., if tuning constants are introduced. We have shown that
the procedure is consistent, i.e., the recursive estimator is close to the value of
the unknown parameter for large $t$. But in practice, the tuning constants
may be useful to control the behaviour of the recursion at the "beginning" of
the procedure. Fig. 1 shows realisations of (4.12) for $\alpha = 3$ and $\theta = 0.5$ for
three different starting values. The number of observations is 40. As we
can see from these graphs, the recursive procedure moves, at each step, in the
direction of the parameter (see also Remark 3.2), but oscillates quite violently
for the first ten steps and then settles down nicely after another ten steps.
This oscillation is due to the small values of the normalising sequence for
the first several steps and can be dealt with by introducing tuning constants.
On other occasions, it may be desirable to lower the value of the normalising
sequence for the first several steps. This happens when a procedure settles
down too quickly, with little or no oscillation, before reaching the actual
value of the parameter. A detailed discussion of these and related topics
will appear elsewhere.
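
The behaviour described in Remark 4.4 can be explored numerically. The following Python sketch is illustrative only and rests on several assumptions that are not stated in this section: that (4.12) has the form $\hat\theta_t = \hat\theta_{t-1} + (c_t I_t)^{-1} X_{t-1}\varphi(X_t - \hat\theta_{t-1}X_{t-1})$, that the innovations follow Student's t distribution with $\alpha$ degrees of freedom, and that $i^g = (\alpha+1)/(\alpha+3)$ (the Fisher information for location of that distribution). The function names, the safeguard `i0`, and the `tuning` argument are hypothetical.

```python
import math
import random


def phi(z, alpha):
    """phi(z) = (alpha + 1) z / (alpha + z^2), as given in the text."""
    return (alpha + 1.0) * z / (alpha + z * z)


def student_t(rng, alpha):
    """Draw from Student's t with alpha d.o.f. as normal / sqrt(chi2 / alpha)."""
    chi2 = rng.gammavariate(alpha / 2.0, 2.0)  # Gamma(shape, scale) = chi-square(alpha)
    return rng.gauss(0.0, 1.0) / math.sqrt(chi2 / alpha)


def simulate_ar1(theta, alpha, n, rng):
    """Simulate X_0, ..., X_n with X_t = theta * X_{t-1} + xi_t and X_0 = 0."""
    xs = [0.0]
    for _ in range(n):
        xs.append(theta * xs[-1] + student_t(rng, alpha))
    return xs


def recursive_estimates(xs, theta0, alpha, i0=1.0, tuning=None):
    """Return the path of estimates under the ASSUMED form of the recursion.

    I_t = i0 + fisher * sum_{s<=t} X_{s-1}^2, where i0 > 0 is a hypothetical
    safeguard against division by zero and fisher is the assumed value of i^g.
    tuning(t) plays the role of c_t and must be positive; None means c_t = 1.
    """
    fisher = (alpha + 1.0) / (alpha + 3.0)
    theta_hat, info, path = theta0, i0, [theta0]
    for t in range(1, len(xs)):
        x_prev, x_cur = xs[t - 1], xs[t]
        info += fisher * x_prev * x_prev
        c_t = 1.0 if tuning is None else tuning(t)
        residual = x_cur - theta_hat * x_prev
        theta_hat += x_prev * phi(residual, alpha) / (c_t * info)
        path.append(theta_hat)
    return path


rng = random.Random(0)
xs = simulate_ar1(theta=0.5, alpha=3.0, n=5000, rng=rng)
for start in (-0.2, 0.1, 0.7):
    path = recursive_estimates(xs, start, alpha=3.0)
    # Print the estimate after 40 observations and at the end of the sample.
    print(start, round(path[40], 3), round(path[-1], 3))
```

A call such as `recursive_estimates(xs, 0.7, 3.0, tuning=lambda t: 5.0 if t <= 10 else 1.0)` inflates the normalising sequence over the first ten steps only, which is one way to mimic tuning constants $c_t$ with $c_t = 1$ eventually.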

APPENDIX A 

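Lemma A1 Let $\mathcal{F}_0, \mathcal{F}_1, \dots$ be a non-decreasing sequence of $\sigma$-algebras and let
$X_n, \beta_n, \xi_n, \zeta_n \in \mathcal{F}_n$, $n \ge 0$, be nonnegative r.v.'s such that

$$E(X_n \mid \mathcal{F}_{n-1}) \le X_{n-1}(1 + \beta_{n-1}) + \xi_{n-1} - \zeta_{n-1}, \quad n \ge 1,$$

eventually. Then

$$\Big\{\sum_{i=1}^{\infty} \xi_{i-1} < \infty\Big\} \cap \Big\{\sum_{i=1}^{\infty} \beta_{i-1} < \infty\Big\} \subseteq \{X \to\} \cap \Big\{\sum_{i=1}^{\infty} \zeta_{i-1} < \infty\Big\} \quad (P\text{-a.s.}),$$

where $\{X \to\}$ denotes the set where $\lim_{n\to\infty} X_n$ exists and is finite.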

Remark A proof can be found in Robbins and Siegmund [23]. Note also that
this lemma is a special case of the theorem on the convergence sets of nonnegative
semimartingales (see, e.g., Lazrieva et al. [13]).

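Lemma A2 Suppose that $g \not\equiv 0$ is a nonnegative even function on $\mathbb{R}$ and
$g \downarrow 0$ on $\mathbb{R}_+$. Suppose also that $\varphi$ is a measurable odd function on $\mathbb{R}$ such
that $\varphi(z) > 0$ for $z > 0$ and $\int_{\mathbb{R}} |\varphi(z - w)|\, g(z)\,dz < \infty$ for all $w \in \mathbb{R}$. Then

$$w \int_{-\infty}^{\infty} \varphi(z - w)\, g(z)\,dz < 0 \tag{A1}$$

for any $w \ne 0$. Furthermore, if $g(z)$ is continuous, then for any $\varepsilon \in (0, 1)$

$$\sup_{\varepsilon \le |w| \le 1/\varepsilon} w \int_{-\infty}^{\infty} \varphi(z - w)\, g(z)\,dz < 0. \tag{A2}$$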

Proof Denote

$$\Phi(w) = \int_{-\infty}^{\infty} \varphi(z - w)\, g(z)\,dz = \int_{-\infty}^{\infty} \varphi(z)\, g(z + w)\,dz.$$

Using the change of variable $z \leftarrow -z$ in the integral over $(-\infty, 0)$ and the
equalities $\varphi(-z) = -\varphi(z)$ and $g(-z + w) = g(z - w)$, we obtain

$$\begin{aligned}
\Phi(w) &= \int_{-\infty}^{0} \varphi(z)\, g(z + w)\,dz + \int_{0}^{\infty} \varphi(z)\, g(z + w)\,dz \\
&= \int_{0}^{\infty} \varphi(z) \big(g(z + w) - g(-z + w)\big)\,dz \\
&= \int_{0}^{\infty} \varphi(z) \big(g(z + w) - g(z - w)\big)\,dz.
\end{aligned} \tag{A3}$$

Suppose now that $w > 0$. Then $z - w$ is closer to $0$ than $z + w$, and the
properties of $g$ imply that $g(z + w) - g(z - w) \le 0$. Since $\varphi(z) > 0$ for $z > 0$,
$\Phi(w) \le 0$. The equality $\Phi(w) = 0$ would imply that $g(z + w) - g(z - w) = 0$
for all $z \in (0, +\infty)$ since, being monotone, $g$ has right and left limits at each
point of $(0, +\infty)$. The last equality, however, contradicts the restrictions on
$g$. Therefore, (A1) holds. Similarly, if $w < 0$, then $z + w$ is closer to $0$ than
$z - w$ and $g(z + w) - g(z - w) \ge 0$. Hence $w\,(g(z + w) - g(z - w)) \le 0$,
which yields (A1) as before.

To prove (A2), note that the continuity of $g$ implies that $g(z + w) - g(z - w)$
is a continuous function of $w$, and (A2) will follow from (A1) if one proves
that $\Phi(w)$ is also continuous in $w$. So, it is sufficient to show that the
integral in (A3) is uniformly convergent for $\varepsilon \le |w| \le 1/\varepsilon$. It follows from
the restrictions we have placed on $g$ that there exists $\delta > 0$ such that $g \ge \delta$
in a neighbourhood of $0$. Then the condition

$$\int_{0}^{\infty} \varphi(z) \big(g(z + w) + g(z - w)\big)\,dz = \int_{-\infty}^{\infty} |\varphi(z - w)|\, g(z)\,dz < \infty, \quad \forall w \in \mathbb{R},$$

implies that $\varphi$ is locally integrable on $\mathbb{R}$. It is easy to see that, for any
$\varepsilon \in (0, 1)$,

$$g(z \pm w) \le g(0)\chi_\varepsilon(z) + g(z - 1/\varepsilon), \quad z > 0, \quad \varepsilon \le |w| \le 1/\varepsilon,$$

where $\chi_\varepsilon$ is the indicator function of the interval $[0, 1/\varepsilon]$. Since the function
$\varphi(\cdot)\big(g(0)\chi_\varepsilon + g(\cdot - 1/\varepsilon)\big)$ is integrable on $(0, +\infty)$ and does not depend on
$w$, we conclude that the integral in (A3) is indeed uniformly convergent for
$\varepsilon \le |w| \le 1/\varepsilon$.
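
Inequality (A1) can be checked numerically for a concrete pair $(g, \varphi)$. The sketch below is only an illustration under stated assumptions: it takes $g$ to be the Student t density with $\alpha = 3$ degrees of freedom and $\varphi(z) = (\alpha + 1)z/(\alpha + z^2)$, truncates the integral to $[-200, 200]$, and uses a plain trapezoidal rule. The function names and the truncation and step parameters are hypothetical choices.

```python
import math


def t_density(z, alpha):
    """Student t density with alpha degrees of freedom (assumed choice of g)."""
    c = math.gamma((alpha + 1) / 2) / (math.sqrt(alpha * math.pi) * math.gamma(alpha / 2))
    return c * (1 + z * z / alpha) ** (-(alpha + 1) / 2)


def phi(z, alpha):
    """Odd function phi(z) = (alpha + 1) z / (alpha + z^2), positive for z > 0."""
    return (alpha + 1) * z / (alpha + z * z)


def big_phi(w, alpha, half_width=200.0, step=0.01):
    """Trapezoidal approximation of Phi(w) = int phi(z - w) g(z) dz on [-L, L]."""
    n = int(round(2 * half_width / step))
    total = 0.0
    for k in range(n + 1):
        z = -half_width + k * step
        weight = 0.5 if k in (0, n) else 1.0  # trapezoid end-point weights
        total += weight * phi(z - w, alpha) * t_density(z, alpha)
    return total * step


# (A1) predicts that w * Phi(w) is negative for every w != 0.
for w in (-2.0, -0.5, 0.5, 2.0):
    print(w, w * big_phi(w, 3.0))
```

Evaluating `w * big_phi(w, 3.0)` on a grid of $w$ with $\varepsilon \le |w| \le 1/\varepsilon$ and taking the maximum gives a rough numerical counterpart of the supremum in (A2).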

Proposition A3 If $d_n$ is a nondecreasing sequence of positive numbers such
that $d_n \to +\infty$, then

$$\sum_{n=1}^{\infty} \frac{\Delta d_n}{d_n} = +\infty$$

and

$$\sum_{n=1}^{\infty} \frac{\Delta d_n}{d_n^2} < +\infty.$$



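Proof The first claim is easily obtained by contradiction from the Kronecker
lemma (see, e.g., Lemma 2, §3, Ch. IV in Shiryayev [29]): if the series converged,
the lemma would give $\frac{1}{d_n}\sum_{i=1}^{n} \Delta d_i \to 0$, whereas $\frac{1}{d_n}\sum_{i=1}^{n} \Delta d_i = (d_n - d_0)/d_n \to 1$.
The second one is proved by the following argument, where $\Delta d_n = d_n - d_{n-1}$:

$$\sum_{n=1}^{N} \frac{\Delta d_n}{d_n^2} \le \sum_{n=1}^{N} \frac{\Delta d_n}{d_{n-1} d_n} = \sum_{n=1}^{N} \left(\frac{1}{d_{n-1}} - \frac{1}{d_n}\right) = \frac{1}{d_0} - \frac{1}{d_N} \to \frac{1}{d_0} < +\infty.$$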







Proposition A4 Suppose that $d_n$, $c_n$, and $c$ are random variables such that,
with probability 1, $d_n > 0$, $c_n \ge 0$, $c > 0$ and $d_n \to +\infty$ as $n \to \infty$. Then

$$\frac{1}{d_n} \sum_{i=1}^{n} c_i \to c \quad \text{in probability}$$

implies

$$\sum_{n=1}^{\infty} \frac{c_n}{d_n} = \infty \quad \text{with probability 1.}$$

Proof Denote $\zeta_n = \frac{1}{d_n} \sum_{i=1}^{n} c_i$. Since $\zeta_n \to c$ in probability, it follows that
there exists a subsequence $\zeta_{i_n}$ of $\zeta_n$ with the property that $\zeta_{i_n} \to c$ with
probability 1. Now, assume that $\sum_{n=1}^{\infty} c_n/d_n < \infty$ on a set $A$ of positive
probability. Then it follows from the Kronecker lemma (see, e.g., Lemma 2,
§3, Ch. IV in Shiryayev [29]) that $\zeta_n \to 0$ on $A$. Then it follows that $\zeta_{i_n} \to 0$
on $A$ as well, implying that $c = 0$ on $A$, which contradicts the assumption
that $c > 0$ with probability 1. ◊
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